'A  Thesis  submitted  i 
Rice  University. 


Histogram  Estimators  of  Bivariate  Densities1 
by 

Joyce  Ann  Stevens  Hiisemann 
Technical  Report  86-5,  April  1986 


partial  fulfillment  of  the  requirements  for  the  degree  of  Doctor  of  Philosophy, 


Report  Documentation  Page 

Form  Approved 

OMB  No.  0704-0188 

Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 

VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 

1.  REPORT  DATE 

APR  1986 

2.  REPORT  TYPE 

3.  DATES  COVERED 

00-00-1986  to  00-00-1986 

4.  TITLE  AND  SUBTITLE 

5a.  CONTRACT  NUMBER 

Histogram  Estimators  of  Bivariate  Densities 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

6.  AUTHOR(S) 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Computational  and  Applied  Mathematics  Department  ,Rice 

University, 6100  Main  Street  MS  134, Houston, TX, 77005-1892 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

15.  SUBJECT  TERMS 

16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 
ABSTRACT 

18.  NUMBER 

OF  PAGES 

110 

19a.  NAME  OF 
RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


acknowledgements 


I  wish  to  thank  my  advisor,  Dr.  George  Terrell,  for  his  generous  suggestions 
and  kind  encouragement;  Dr.  Robert  Stevens  for  the  important  grid  algorithm 
and  for  the  density  graphics;  Dr.  Hans  Muller  for  the  excellent  histogram 
graphics;  and  my  husband,  Michael,  for  many  hours  of  typing,  for  encourage¬ 
ment  and  understanding,  and  especially  for  demanding  that  I  obtain  my  Ph.D. 


This  research  -w'as  supported  in  part  by  the  Army  Research  Office  (Durham) 
and  by  the  Office  of  Naval  Research  under  Grant  No.  DAAG29-85-K-0212  and 
Grant  No.  N0001485-K-0100  respectively. 


abstract 


HISTOGRAM  ESTIMATORS  OF  BIVARIATE  DENSITIES 

by 

Joyce  Ann  Stevens  Hiisemann 

One-dimensional  fixed-interval  histogram  estimators  of  univariate  pro¬ 
bability  density  functions  are  less  efficient  than  the  analogous  variable- 
interval  estimators  which  are  constructed  from  intervals  whose  lengths  are 
determined  by  the  criterion  of  integrated  mean  squared  error  (IMSE) 
minimization.  Similarly,  two-dimensional  fixed-cell-size  histogram  estima¬ 
tors  of  bivariate  probability  density  functions  are  less  efficient  than  vari¬ 
able  cell  size  estimators  whose  cell  sizes  are  determined  from  IMSE  minim¬ 
ization.  Only  estimators  whose  cell  sides  are  parallel  to  the  coordinate  axes 

are  examined. 

The  estimators  are  classified  according  to  the  functional  dependence 
of  their  cell  dimensions  upon  x  and  y  :  each  cell  dimension  of  the 
Minimally  Restricted  Mesh  depends  upon  both  x  and  y  ;  one  cell  dimen¬ 
sion  of  the  Semi-fixed-dimension  Mesh  is  fixed,  and  the  other  depends 
upon  either  x  alone  or  y  alone;  one  cell  dimension  of  the  Variable- 
dimension  Mesh  I  depends  upon  x  and  the  other  upon  y;  one  cell  dimen- 


sion  of  the  Variable-dimension  Mesh  II  depends  upon  x  alone  or  y  alone 
and  the  other  depends  upon  both  x  and  y .  The  Minimally  Restricted  Mesh 
results  in  the  smallest  IMSE  of  the  four  types,  but  is  not  implementable. 
The  other  meshes  are  implementable  and  are  listed  above  in  order  of 
decreasing  IMSE.  Random  vectors  from  Dirichlet,  mixed  bivariate  and 
elliptical  bivariate  normal  distributions  were  generated  and  used  to  con¬ 
struct  optimal  histograms.  The  Variable-dimension  Mesh  II  produced  his¬ 
tograms  having  IMSEs  from  20  to  90  percent  smaller  than  those  from  his¬ 
tograms  based  upon  optimal  fixed-dimension  meshes.  The  most  substantial 
improvements  were  observed  for  mixed  bivariate  normal  densities  having 
strongly  unequal  variances.  Modest  improvements  (20%)  were  observed  for 
skewed  densities  and  slightly  elliptical  densities,  but  no  improvements 
were  observed  in  cases  of  highly  elliptical  densities  whose  axes  were 
rotated  45%  from  the  coordinate  axes. 
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(zjfe-i.  **].  A  histogram  estimate  is  constructed  by  placing  a  block  of  height 
/  along  each  such  interval. 

The  histogram  estimate  may  also  be  derived  in  the  following  way 
(Rosenblatt,  1965):  Since  the  probability  density  function  is  the  derivative  of 
the  cumulative  distribution  function,  we  may  estimate  the  density  /  by  a 
central  difference  approximation  to  the  derivative  of  F: 

?  F(x+h)-  F(x-h± 

f(x)~  2  h 


i/x+h)  i{x—h) 


n 
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2  h 


lAx+h.)  —  iAx—h' 
2  nh 


l  n 
nh  lZ\ 


X  —  X: 


(1.1.1) 


where  K„  is  a  function  of  If  -1-  >  1.  A®  sample  point  I,-  lies 

outside  the  interval  (x-h,  x+h ]  and  K0  is  set  equal  to  zero,  i.e.,  the  point 

X  —  X{ 

does  not  contribute  to  the  estimate  of  /  at  x.  If  -  <  1  »  xi  lies 


X  —  X: 


within  the  interval  and  K0  is  set  equal  to  l/2  so  that  a,-  contributes  to 


1.  Introduction 


Non-parametric  density  estimation  refers  to  the  estimation  of  probabil¬ 
ity  density  functions  whose  general  form  as  well  as  parameters  are  unknown. 
Of  the  several  methods  of  non-parametric  probability  density  estimation,  we 
will  discuss  the  four  most  frequently  encountered  and,  for  our  purposes,  the 
four  most  instructive:  the  kernel,  the  k-nearest-neighbor,  the  series,  and  the 

histogram. 

1.1  Important  non-parametric  estimators  of  probability  density 

We  begin  with  a  random  sample  of  size  n  from  a  population  whose 
underlying  density  is  the  object  of  our  study.  Let  v(x)  be  the  number  of 
sample  points  having  values  less  than  or  equal  to  x.  Then  a  natural  approxi¬ 
mation  to  the  cumulative  distribution  function  F  is 

f=iM  . 

n 

Similarly,  a  natural  approximation  to  the  probability  density  function  /  is 

j  =  ^xk)  ~ 

n{xk  -  xjt-i) 

where  the  xk  are  points  defined  by  a  mesh  on  the  real  line  and  where  uk  = 
i{xk )  —  i\xi t_i)  is  the  number  of  sample  points  falling  in  the  kth  interval 
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’rough’  to  represent  /  adequately  since  it  will  reflect  random  fluctuations  in 
the  data.  It  is  apparent,  therefore,  that  in  both  types  of  estimation,  a  partic¬ 
ular  choice  of  h  may  not  be  optimal  throughout  the  entire  domain  of  support 
of  f .  In  addition,  an  optimal  choice  of  h  will  depend  upon  sample  size,  since 
a  smaller  sample  size  requires  a  larger  h  so  that  a  sequence  of  separate  peaks 
is  not  obtained;  on  the  other  hand,  a  larger  sample  size  will  accommodate  a 

smaller  h. 


A- 

Figure  1.1.  Above:  example  of  histogram  in  which  h  is  too  small  so  that  /  is 
too  rough  (variance  is  large  and  bias  is  small).  Below:  example  of  histogram 
in  which  h  is  too  large  so  that  /  is  too  smooth  (variance  is  small  and  bias  is 

large). 


The  k  nearest-neighbor  method  is  a  fixed-frequency  approach  (Breiman, 
et  al.,  1977)  in  which  the  number  of  sample  points  k  =  i\x  +h )  —  v(x  —  h ) 
falling  in  an  interval  is  fixed  and  the  sizes  of  the  intervals  {x-h,  x+h]  vary 
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the  estimate  f{x).  It  should  be  apparent  that  this  procedure  will  produce  a 
histogram  estimate  of  /  where  /  at  x  is  represented  by  the  height  of  the  his¬ 
togram  block  which  is  centered  at  x  and  of  width  2 h.  The  shape  of  the  his¬ 
togram  will  depend  upon  the  choice  of  h,  the  parameter  of  interval  width 
(also  called  window,  cell,  or  bin  size). 

The  kernel  method  is  a  straightforward  generalization  of  the  above,  see 

Parzen  (1962)  and  Rosenblatt  (1965).  If  one  inquires  whether  it  is  really 

desirable  that  every  point  falling  into  a  given  interval  contribute  equally  to 

the  estimate  there  when,  perhaps,  those  points  falling  near  the  boundaries  of 

the  interval  are  of  less  importance  to  the  estimate  at  x,  an  estimator  which 

weights  the  contribution  of  points  according  to  their  distance  from  x  recom- 

-z8 

X  — •  X •  1  2 

mends  itself.  Let  ~  =  z  and,  as  an  example,  let  K0  \Z2tt  & 

in  (1.1.1)  so  that  all  points  of  the  sample  contribute  to  the  estimate  at  x  but 
in  inverse  proportion  to  their  distances  from  x .  K0  is  called  the  kernel  func¬ 
tion  and  may  be  any  function  which  satisfies  conditions  which  insure  estima¬ 
tor  consistency. 

In  the  kernel  approach,  the  shape  of  the  estimate  depends  upon  the 
choice  of  kernel  and  of  ’smoothing  parameter’  h .  In  both  histogram  and  ker¬ 
nel  estimation,  h  may  be  considered  to  be  a  smoothing  parameter  since  for 
large  h,  /  will  appear  ’smoother’  than  when  h  is  small,  variance  is  large, 
bias  is  small  and  /  follows  /  more  closely.  If  h  is  too  small,  /  will  be  too 
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tfx. 


Use  of  this  particular  criterion  leads  to  the  following  definition  of  the  a 

a{  =  £  [&(*)] 


-  —  S^.*(**) 

nifc-i 


where  is  a  sample  from  / .  The  density  estimate  becomes: 
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g  - 
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We  observe  that  this  estimator  depends  upon  the  choice  of  series  and  upon 
the  number  of  terms  m  which  are  to  be  included  in  the  approximation.  Since 
the  series  is  a  global  estimator  with  adaptability  being  limited  to  the  selec¬ 
tion  of  an  optimal  m,  the  approach  does  not  readily  lend  itself  to 
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in  order  to  accommodate  the  constant  number  of  points.  The  estimate 


K 

nhk  .-1 


X  —  X; 


,  where  hk  is  the  radius  of  the  window  with  k 


neighbors,  depends  upon  the  choice  of  k  and  is  subject  to  less  random  varia¬ 
tion  (i.e.,  has  a  smaller  variance)  when  k  is  large  but  is  also  subject  to 
smaller  errors  due  to  averaging  (i.e.,  has  a  smaller  bias)  when  the  interval 
length  ,  and  therefore  k,  is  smaller.  Again,  the  optimal  choice  of  parameter, 
k  in  this  case,  may  not  be  constant  throughout  the  domain  of  support  of  /. 

Like  the  histogram  and  k- nearest-neighbor  methods,  the  series  method 
(fcencov,  1962)  may  also  be  interpreted  as  a  special  case  of  kernel  estimation 


if  Kn 


x  -  xk 


is  set  equal  to  (**)&(*)  Wltt  &  defined  as  follows. 


i-l 


The  density  function  /  can  be  considered  to  be  a  waveform  which  may  be 


approximated  by  a  series  of  orthonormal  basis  functions  such  as  Fourier 
series,  Legendre  polynomials,  Hermite  polynomials  and  others.  If  {<£,}  is  the 
selected  set  of  orthonormal  basis  functions,  then  the  density  /  is  approxi¬ 
mated  by: 


/(*)  -  E  «»&(*) 


»-i 


where  the  series  is  truncated  so  that  there  are  a  finite  number  of  terms  m 
and  where  the  a,-  are  approximated  by  a,-  which  are  determined  by  minimiz¬ 
ing  an  error  criterion  such  as 
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will  also  develop  some  practical  criteria  for  the  construction  of  optimal  histo¬ 
grams  in  applied  problems  in  which  the  form  of  the  underlying  density  is 
known  or  assumed.  The  problem  of  constructing  pure  data-based  histograms 
is  beyond  the  scope  of  this  paper. 
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modifications  designed  to  provide  optimality  within  particular  subregions  of 
the  domain  of  support. 

1.2  Choice  of  methods 

In  any  applied  problem,  the  choice  of  approach  depends  upon  practical 
as  well  as  theoretical  considerations.  We  have  selected  the  histogram  estima¬ 
tor  for  investigation,  because  it  is  conceptually  the  simplest,  the  easiest  to 
apply  and,  therefore,  the  most  commonly  used  of  all  probability  density  esti¬ 
mators. 

1.3  Research  objectives 

In  all  of  the  density  estimation  approaches  presented  above,  there  is  a 
recurring  problem  of  simultaneously  controlling  both  variance  and  bias 
through  judicious  choice  of  parameter  as  well  as  the  difficulty  created  by  the 
fact  that  the  optimal  parameter  in  one  region  of  the  domain  of  support  may 
very  well  not  be  optimal  in  another  region.  Both  of  these  questions  have 
been  addressed  for  probability  densities  which  are  functions  of  only  one  vari¬ 
able  (Scott,  1979,  1982),  (Scott  and  Terrell,  1983).  New  complications  arise 
when  densities  of  more  than  one  variable  are  considered  (Terrell,  1983,  1984, 
1986),  (Scott,  1985).  It  is  the  purpose  of  the  present  paper  to  address  the 
questions  of  parameter  choice  and  parameter  variability  in  the  case  of  histo¬ 
gram  estimators  of  bivariate  probability  density  functions.  Results  of  this 
research  should  provide  a  basis  for  future  research  in  higher  dimensions.  We 
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IMSE\f{x )]  =  f  Var[f{x)\  dx  +  J  Bias2{  f  {x)\  dx. 


The  optimal  fixed  interval  length  is  derived  as  follows.  Let  f'(x)  be 
square  Riemann  integrable  and  defined  over  the  entire  real  line.  Let  h  be 
the  length  of  each  interval  and  let  n  be  the  sample  size  so  that  h(n)-+0  as 
n~*oo  and  nk(n)- kx>  as  n-+-oo.  Let  u,(x)  be  the  number  of  points  falling  in 
the  ith  interval  {y0+ih,  y0+(i+l)h]  where  y,+*A-y*.  H  we  have  an 
independent  random  sample,  then  has  a  binomial  (n,  p,-)»  distribution 

where  p,-  is  the  probability  that  a  sample  point  will  fall  in  the  ith  interval, 

i.  e. , 

y,+h 

Pi  -  /  /(*) 

y. 


Then  the  histogram  estimator  of  /  at  x  becomes 

f(x)  =  x  e  (y0+*A,  yo+O'+i)/1] 

nh 

which  is  the  proportion  of  sample  points  falling  in  the  interval  divided  by  the 
size  of  the  interval. 

We  will  derive  an  expression  for  the  integrated  mean  squared  error 
beginning  with  the  bias  term  (c.f.,  Scott,  1986). 


0 


2.  Theory 

2.1  One-dimensional  estimation 

In  1979,  Scott  addressed  the  question  of  optimal  interval  length 
(binwidth)  for  histograms  which  were  constructed- to  approximate  the  proba¬ 
bility  density  function  of  one  random  variable  and  whose  interval  lengths 
remained  constant  throughout  the  domain  of  support.  The  integrated  mean 
squared  error  (discussed  below)  was  suggested  as  a  global  measure  of  histo¬ 
gram  error  and,  thereby,  introduced  a  rigorous  treatment  of  variance  and 
bias  into  the  context  of  histogram  estimation.  A  global  measure  is  preferred 
since  it  is  the  shape  of  the  density  that  is  of  interest,  so  that  an  optimal  his¬ 
togram  is  one  which  best  approximates  the  form  of  the  true  distribution. 

Let  /(x)  be  the  probability  density  function  at  a  point  x,  and  let  /(x) 
be  its  estimator.  Then  the  mean  squared  error  (MSE)  of  /(x)  is  defined  as: 

MSE\f{x)]=E[{  f{x)-f{x))2} 

-  E[  (  /(*)  -  E[f{x))  f  ]  +  [  E[f(x))  -  f(x)  ]2 
=  Var[f  (x)]  +  jBtas2[  /  (x)] 


so  that  the  integrated  mean  squared  error  (IMSE)  becomes 
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(2.1.1) 


where  ^  ,f2  >£3  >£4  e  (y»,  y»+^l- 


so  that  over  all  the  intervals  of  the  histogram  we  have 
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f  \ 


by  the  Riemann  integrability  of  f\x)  as  n^oo  and  nh  —  oo  .  Therefore 
the  integrated  mean  squared  error  becomes: 


IMSE 


TMu  IM* +  o(k2)  + 


Differentiating  the  above  expression  with  respect  to  h  and  setting  the  result 
equal  to  zero,  we  obtain  asymptotically 


0 


dx 


so  that 


When  the  optimal  constant  interval  length  h*  is  used,  the  following  minimal 
IMSE  will  be  obtained 


IMSE*  =  6  3  n  3 


The  problem  of  optimality  in  different  regions  of  the  domain  of  support 
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by  the  Riemann  integrability  of  [/'(y)J  as  n  — ►  oo  and  h  — ►  0  . 

Now,  considering  the  variance  term  in  the  expression  for  the  integrated 
mean  squared  error,  we  have 


Var\J{y) ]  = 


Var[u{), 

n2h2 


nPiQ-  -  Pi) 

n2h 2 
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77  f  /(*)  dx 
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where  Co  e  (y»  >  y«+M  • 


Then 
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/(&) 

nh(x) 


-/2«.) 
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and 


h3(x) 
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Therefore 


His) 

nh 


-  V(w  +  [mtf  -7-  -  [/'(^r 


(2.1.3) 


using  (2.1.1)  and  (2.1.2)  above. 

The  mean  squared  error  averaged  over  an  interval  will  no  longer  remain  con¬ 
stant  but  will  vary  according  to  the  size  and  location  of  the  interval.  There¬ 
fore,  for  a  particular  location  x  in  the  domain  of  support,  we  wish  to  obtain 
an  optimal  interval  length  h{x)  which  will  minimize  the  mean  squared  error 
averaged  over  that  particular  interval.  The  minimal  integrated  mean  squared 
error  will  be  obtained  only  when  the  entire  sequence  of  optimal  interval 


lengths  is  employed. 
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remains,  however. 

In  1982  and  1983,  Terrell  and  Scott  addressed  this  issue  for  probability 
densities  of  one  random  variable.  If  the  interval  lengths  are  allowed  to  vary, 
that  is,  if  h  becomes  a  function  of  x,  then  the  mean  squared  error  over  one 
interval  becomes 

y.+M*) 

I  Var  [/(»)]<!* 


y,+A(z) 

+  yTT  f  Bias2[f{x)}  dx  —  A  +  B 

M2)  y, 


where 


y,+h{x) 


A  = 


M2) 


I 


nh2{x) 


y> +*(*)  , 

/  f(z)ix  --/2K.) 
^  n 


+ 


h{x) 


M  ^  -  M  - 


'(2) 


.  y.+M*) 

-±—  f 

h{x) 


*(«)/(« -iAt.) 


nh2(x) 


h(x) 


^-A2(x)  -  ±/*(€.)  i(x) 
nh2(x)  n 
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That  k{*{n,x)  — ►  0  for  each  interval  i  as  n  — ►  oo  in  the  case  of  a  normal 
density  can  be  seen  from  the  following  argument.  Let  us  truncate  our  esti¬ 
mate  at  x0  and  xk  so  that  f{x)—0  for  x  <  x0,  and  x  >  xk. 

The  integrated  mean  squared  error 


IMSE  -  E 


/(/(*)-  H*)?dx\ 


y-oo 


Xk 

/(/(*)  -  /(*))2  dl 

+  E 

!  (/(*)  -  h*)?  dx 

x . 

V  J 

— OO 

k 

+  E 


oo 


/(/(*)-  /(*)r  * 

Xk 


becomes 


-  E 


Xk 


/(/(*)-/(*))*  ** 


X. 


oo 


+  J  [f(x)}2  dx  +  J{f{x)Y  dx 


Xk 


if  j  Xk  -  x0\  is  sufficiently  large  so  that  an  arbitrarily  small  number  of  points 
fall  outside  the  interval  {x0,xk\. 


For 
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Holding  x  constant  and  minimizing  the  mean  squared  error  (2.1.3)  over 
(t y{+h]  by  differentiating  with  respect  to  h  and  setting  the  result  equal  to 


zero,  we  have 


0 


fJA s) 

nh2(x) 


+  f  [/.'(fa)]2  [/.'(fa)]2 


/(€s) 

nh2{x) 


h{x) 


fW-iW 


/(&) 

n 

2_ 

3 

’/,'(&) 

>  « 

2  J_ 
2 

/.'(fa)]2 

for  a  particular  value  of  x.  We  assume  that  h*(x)  is  approximately  constant 
over  the  interval  (y,-  y,-- l-A]  and  substitute  A  (x)  into  (2.1.3)  above  to  obtain 
the  mean  squared  error  over  the  interval: 


MSE  (*.*+*]  = 


2 

3  3 

— n 
2 


/3(es) 


f  - 1 


-  V(f.) 

n 


and  the  integrated  means  squared  error: 


2MS£  = 


2 

f»  \E  1 


/'(&.■) 


f[/'(ea.)]2-i  [/%,•)] 


■A,-  +  O 


_1_ 
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variable  interval  case.  If  /(*)  is  the  normal  distribution  with  mean  equal  to 
zero  and  variance  equal  to  one, 


h  «  3.491  n  3  for  the  fixed  interval  case 


and 


2_  _J_ 

h  «  2.469  jx|  3  -e  6  -n  3  for  the  variable  interval  case. 
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/(l)“  \7ST 


— x 
2 


1  -  F(x)  ~ 


—I 

1 


Choose  xn  so  that  1  —  F(xn)  ~  — .  Then 
"  n 

n  =  V27T  xn  e  2 

and  the  optimal  asymptotic  interval  width 


as  n  — ►  oo  . 


6/K) 


N/'(*b)]2  j 
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OO 

Since  / /2(x)  rfx  — ►  0  as  n  — ►  oo 


and 


X  00 

j  [ /(*)  -  /(a:)]2  dx  —  J  [/(*)  -  /(x)]2  dx  ,  we  have 


7M5£ 


—  6 
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.2.  00 


—00 


2 

\ 

f  > 

l/(x)  J\x)y  dx  +  o  (max  h( 

2)  +  O 

1 

n 

by  the  Riemann  integrability  of  /(y)  and 


Comparison  of  the  minimal  obtainable  integrated  mean  squared  error  for  the 
fixed  and  variable  interval  cases  shows  that  the  minimal  IMSE  for  the  fixed 
interval  case  is  always  greater  than  or  equal  to  the  minimal  IMSE  for  the 
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Let  X={x,y)  be  a  bivariate  random  variable  with  joint  probability  den¬ 
sity  function  f(x,y).  If  the  domain  of  support  is  subdivided  into  rectangles  of 
the  form  (xtj  xi+g]X{yii  l U+h]  where  g> 0  and  h> 0  are  the  lengths  of  the 
sides,  a  histogram  estimator  of  f{x,y)  at  the  point  ( x,y )  may  be  defined  in 
analogy  with  the  univariate  case  as: 

f(x,y)  =  for  x  e  (x,-  x{+g},  y  e  (y,-,  y i+h] 

where  v(x,y)  is  the  number  of  sample  points  falling  in  the  rectangle  and 
where  g  and  h  may  be  constant,  functions  of  x{  or  y,-  alone,  or  functions  of 
both  Xi  and  y,-.  The  integrated  mean  squared  error  is  defined  as  before  and  in 
the  bivariate  case  becomes: 

IMSE\f{x,y)}  ==/  /  E[  (  /(x,y)  -  f{x,y)  f  ]  dxdy 


oo  oo 

I  J 

— OO  —00 


E[  f{x,y)  -  E[f{x,y)}2}  +  [  E\f{x,y)}  -  f{x,y)  ]5 


dxdy 


=  7  7  [  Var [/(rc,y)]  -I-  Bias2[f{x,y)  ]  ]  dxdy. 

— oo  — OO 


Once  again  v(x,y)  has  a  binomial  (n,p)  distribution  with  p  the  probability 
that  a  sample  point  (x,y)  lies  in  the  above  rectangle  centered  at 
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2.2  Two-dimensional  estimation 

In  our  discussion  of  the  one-dimensional  case  of  histogram  estimation  it 
was  apparent  that  there  were  only  two  possible  types  of  mesh  from  which  a 
histogram  might  be  constructed:  h  could  remain  constant  throughout  the 
domain  of  support  or  could  vary  according  to  some  criterion  of  optimality. 
However,  when  the  concept  of  histogram  estimation  is  extended  to  two 
dimensions,  the  number  of  possible  grid  types  becomes  infinite:  the  plane  may 
be  partitioned  into  sets  of  any  shape  as  long  as  the  sets  are  mutually 
exclusive  and  the  subdivision  is  exhaustive.  Since  most  histograms  based 
upon  arbitrary  partitions  would  not  be  implementable,  Terrell  (1983) 
confined  his  investigation  to  rectangular  meshes  with  cell  sides  parallel  to  the 
coordinate  axes.  Scott  (1985)  studied  a  number  of  other  mesh  types,  some 
having  tria.ngular  and  others  hexagonal  shapes,  but  found  that  hexagons 
resulted  in  only  slightly  improved  estimates  at  a  cost  of  some  difficulty  in 
implementation  and  that  regular  triangles,  which  were  also  difficult  to  imple¬ 
ment,  resulted  in  worse,  estimates  than  did  Terrell’s  variable-dimension  rec¬ 
tangles.  Terrell  also  demonstrated  that  a  partition  of  the  plane  by  a  rhom- 
boidal  mesh  produced  histogram  estimators  equivalent  to  those  produced  by 
a  rectangular  mesh.  In  view  of  the  above,  the  mesh  types  in  the  present 
paper  have  been  confined  to  rectangular  grids  having  cell  sides  of  variable 
length  and  width  parallel  to  the  coordinate  axes.  The  problem  of  grid  orien¬ 
tation  is  a  separate  topic  of  research  and  will  not  be  treated  here. 
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so  that  the  bias  over  one  rectangle  is 
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so  that 


y,+h 

P  “  I 


Vi 


x  ,+g 

J  f{x,y)  dxdy 

X, 


where  x  e  (x{,  x{+g]  and  y  e  (y,-,  j U+h] 
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by  the  integral  form  of  the  mean  value  theorem  where  each  ,  fc=l,..,6,  is  a 
particular  value  of  £*  for  some  x  and  where  each  fy*  ,  is  a  particu¬ 

lar  value  of  f,  for  some  y,  and  where  x  -  xf  >  0,  y  -  Vi  >  0  since 
x  €  (x„  x,-fy)  ,  y  €  (yft  y,+M-  The  above  then  becomes 
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where  a  —  (x,-f£(x—  xt)  ,  y,+^(y— y;))  an<^  ^  —  (x.d"£(x  x«)  »  y«+£(y  y«)) 
and  where  0  <  f  <  1,  by  the  mean  value  theorem. 
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and  g  ,h  —+  0  ,  we  have  by  the  Riemann  integrability  of  partial 
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The  variance  of  f(x,y)  is  : 
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ting  the  expression  above  equal  to  zero,  we  obtain  the  following  conditions  . 

r 

/  fv, 


-oo  -oo 


^L  +  jl 
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IL 


dx 

\  / 


dxdy  =  0 
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(2.2.2) 
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■=*-  +  A 


ngh 2  6 


*L 

dy 


dxdy  —  0 


K  we  solve  the  above  equations  simultaneously  for  g  and  h  and  choose  the 
rectangular  dimensions  accordingly,  we  will  obtain  the  minimum  integrated 
mean  squared  error.  Different  solutions  to  the  above  equations  are  obtained 
depending  upon  whether  g  and  h  are  constant,  functions  of  only  one  variable 
or  functions  of  both  variables,  each  case  reflecting  a  different  type  of  mesh. 


2.3  Minimally  restricted  (Free)  mesh 

If  g  and  h  are  functions  of  both  x  and  y ,  we  obtain  a  subdivision  of  the 
domain  of  support  by  rectangles  of  arbitrary  dimensions. 
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Thus  the  integrated  mean  squared  error  becomes  : 
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Now,  treating  g  and  h  as  if  they  were  continuously  varying  functions  in  x 


and  y,  we  take  the  derivative  with  respect  to  g  in  the  arbitrary  direction  rjx 
and  the  derivative  with  respect  to  h  in  the  arbitrary  direction  7?y  and  set- 
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which  yield  an  optimal  integrated  mean  squared  error 
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2.4  Fixed-dimension  (Regular)  mesh 

If  g  and  h  remain  constant  throughout  the  domain  of  support,  we 
obtain  a  subdivision  which  is  mutually  exclusive,  exhaustive,  and  easily 


implementable . 
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Figure  2.3.  Minimally  restricted  (Free)  mesh 


Unfortunately,  these  will  not  necessarily  be  either  mutually  exclusive  or 
exhaustive  if  we  try  to  specify  the  optimal  width  and  height  in  each  region  of 
the  plane.  Such  a  scheme  is  clearly  not  implementable  but  is  rather  of 
theoretical  interest  because  it  provides  a  lower  bound  for  the  integrated 
mean  squared  error  for  all  possible  rectangular  meshes  whoses  cell  sides  are 
parallel  to  the  coordinate  axes.  Since  r\t  and  r\y  are  arbitrary  functions  of  x 


and  y,  the  equations  (2.2.2)  hold  if  and  only  if 
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The  solution  to  these  equations  is  easily  obtained  as 
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Substitution  of  these  values  into  (2.2.1)  yields  the  minimal  integrated  mean 
squared  error  for  this  mesh  type  : 
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A  similar  argument  is  found  in  Nesames  (1980).  As  will  be  demonstrated  in 
later  sections,  more  efficient  yet  easily  implementable  mesh  types  may  be 
designed. 


2.5  Semi-fixed-dimension  (Semiregular)  mesh 

If  g  is  a  function  of  x,  and  h  remains  constant  throughout  the  domain 
of  support  of  / ,  we  obtain  a  partition  in  which,  for  example,  the  cell  widths 
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Figure  2.4.  Fixed-dimension  (Regular)  mesh 


The  histogram  estimator  produced  by  this  scheme,  however,  is  less  efficient 
relative  to  that  produced  by  the  minimally  restricted  mesh  than  others 
which  will  be  proposed  later.  When  g  and  h  are  both  constant,  equations 

(2.2.2)  become 
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with  solution 
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A  similar  semi-fixed-dimension  mesh  may  be  obtained  if  g  remains  constant 
and  h  becomes  a  function  of  y.  The  expression  for  g,  h{y),  and  the  IMSE 
are  identical  to  those  for  the  above  h ,  ^(x),  and  IMSE,  respectively,  except 

that  and  dx  and  dy  are  exchanged  throughout. 
dx  dy 

2.6  Variable-dimension  (Grid)  mesh  I:  g(x),  h(y ) 
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remain  constant  while  the  lengths  vary  in  order  to  accommodate  changes  in 
the  form  of  the  probability  density  function  in  different  regions  of  its  domain 
of  support.  This  scheme  produces  a  histogram  estimator  which  is  more 
efficient  than  the  fixed- dimension  mesh. 


Figure  2.5.  Semi-fixed-dimension  (Semiregular)  mesh 


With  the  above  assumptions,  equations  (2.2.2)  become 
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with  solution 
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If  f(x,y)  can  be  written  as  the  product  of  two  functions,  each  of  which  is  a 
function  of  only  one  variable,  i.e.  ,  if 

f(x,y)  =  r(x)  s{y) 


then  analytic  solutions  of  the  following  form  may  be  obtained  : 


yielding  an  optimal  integrated  mean  squared  error 
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If  g  is  a  function  of  x  and  h  is  a  function  of  y  ,  we  obtain  a  mesh 
which  is  adaptable  in  both  dimensions  to  the  form  of  the  density  function 
and  which  is  almost  as  easily  implemented  as  the  fixed- dimension  mesh. 


In  several  respects  this  type  of  mesh  is  optimal  since  it  combines  both  adap¬ 
tability  and  ease  of  implementation.  Marginal  histograms  as  well  as  histo¬ 
grams  along  any  strip  in  either  direction  may  easily  be  obtained.  Although 
each  such  histogram  is  not  itself  optimal,  the  set  of  all  such  histograms  so 
constructed  is  optimal  on  the  average.  Difficulties  arise,  however,  when  an 
attempt  is  made  to  solve  the  equations  deriving  from  (2.2.2)  under  the  above 
restrictions  on  g  and  h  : 


ng2{x)  _oo  h(v) 
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dx 


dy 


and 


(2.6.1) 
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Figure  2.7.  Variable-dimension  (Semigrid)  mesh  II:  g(x),h(x,y) 


An  analytic  solution  to  the  equations  deriving  from  (2.2.2)  under  the  present 
restrictions  is,  however,  readily  obtainable  as  : 
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In  the  general  case  when  f{x,y)  is  not  separable,  a  numerical  rather  than  an 
analytic  solution  has  been  obtained.  The  algorithm  will  be  treated  in  section 
3  of  this  paper. 


2.7  Variable-dimension  (Semigrid)  mesh  II:  <?(z),  Mx>y) 

If  g  is  a  function  of  x  and  h  is  a  function  of  both  x  and  y  ,  we  obtain  a 
mesh  which  performs  better  in  terms  of  the  integrated  mean  squared  error 
than  the  variable-dimensioned  mesh  described  in  section  2.6  but  at  a  cost  of 
considerable  difficulties  in  implementation. 
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3.  Implementation 
3.1  Introduction 

In  the  previous  section,  several  types  of  two-dimensional  meshes  were 
introduced  together  with  expressions  for  their  optimal  cell  dimensions  and  for 
the  expected  minimal  integrated  mean  squared  error.  Theoretically  optimal 
histograms  were  constructed  on  the  basis  of  these  expressions  using  a  variety 
of  probability  density  functions.  The  integrated  mean  squared  error  was  cal¬ 
culated  in  each  case  and  the  IMSEs  compared  in  terms  of  their  efficiencies 
relative  to  the  least  restrictive  mesh  previously  introduced  (the  mesh  in 
which  each  cell  dimension  is  a  function  of  both  x  and  y ).  No  optimal  histo¬ 
grams  were  constructed  for  the  latter  case  since,  as  noted  previously,  a 
completely  freely  varying  mesh  is  not  implementable.  Since  the  theoretical 
mean  squared  error  also  depends  upon  sample  size,  a  variety  of  sample  sizes 
(from  50  to  5000)  were  used  and  their  effect  upon  the  IMSE,  relative 
efficiency,  and  form  of  the  optimal  histogram  were  observed.  Finally,  simu¬ 
lated  data  were  used  to  test  several  of  the  theoretical  constructs. 

3.2  Cell  dimensions  g  and  h 

In  cases  where  the  bin  dimensions  g  and  h  remain  constant  throughout 
the  domain  of  support,  where  g  is  a  function  of  x  and  h  remains  constant, 
where  h  is  a  function  of  y  and  g  remains  constant,  where  g  is  a  function  of 
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IMSE  -  2-6 
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Since  for  each  of  the  mesh  types  above  we  are  minimizing  the  integrated 
mean  squared  error  under  increasingly  tight  constraints,  it  may  be  shown 
that  the  minimum  IMSE  becomes  larger  as  more  constraints  are  placed 
upon  the  mesh,  in  particular  : 

IMSE  (free)  <  IMSE  (semigrid)  <  IMSE  (grid)  <  IMSE  (semiregular) 
<  IMSE  (regular). 
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given  point  the  average  number  of  histogram  intervals  per  unit  distance 
along  a  particular  axis.  Solving  the  first  equation  for  G{x)  and  the  second 
equation  for  H{x)  ,  the  following  numerical  approach  may  be  constructed  : 

Let  H0  be  any  positive  real  number  between  10  28  and  1028.  Then  for 
n— where  N  is  a  positive  integer,  let 


and 


<?„(*) 


»I(*1  " 


00 


f  Hn{y)  f(x,y)  dy 


— oo 


1_ 

3 


(3.2.1) 


Hn+i(y) 


f  Gn{x)  f{x,y)  dx 

—  OO 


and  replace  Hn{y)  in  the  first  equation  by  Hn+l{y)  upon  each  iteration. 
When  the  analytic  forms  of  g{x)  and  f{y)  are  known,  as  in  the  case  of 
separable  f{x,y),  the  algorithm  was  observed  to  produce  convergence  to  the 
correct  value  with  four  place  accuracy  within  five  iterations.  That  conver¬ 
gence  should  occur  may  be  seen  from  the  following  argument. 


Let  equations  (3.2.1)  define  the  sequences 


\ 

\ 

•and  • 

Hn 

< 

and  let 
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x  alone  and  h  is  a  function  of  both  i  and  y,  or  where  g  is  a  function  of  both 
x  and  y  and  h  is  a  function  of  y  alone,  we  have  seen  that  analytic  solutions 
to  the  equations  for  optimal  binwidth  are  obtainable.  The  implementation  of 
optimal  histograms  on  the  basis  of  these  is  straightforward.  If  ^  is  a  function 
of  x  alone  and  h  is  a  function  of  y  alone,  the  analytic  solution  given  in  sec¬ 
tion  2.6  has  been  obtained  for  the  case  in  which  f{x,y)  may  be  written  as  a 
product  of  functions  r(x)  and  s{y).  Again,  implementation  is  straightforward. 
However,  in  the  general  case  in  which  f(x,y)  is  not  separable,  no  analytic 
solution  to  the  simultaneous  integral  equations  (2.6.1)  has  been  obtained. 
Rather,  a  numerical  procedure  involving  a  functional  iteration  was  used  to 
determine  the  optimal  <7(2)  and  h{y).  Since  the  solution  depends  upon  both 
the  functional  form  of  the  density  and  the  sample  size,  the  iteration  must  be 
performed  whenever  these  are  modified.  Let  the  equation  (2.6.1)  be  written 

in  the  following  form: 

\0 

/  H(y)f  (x,y)  dy  =  fG3(x)J  -|£  dy 

—00  —  00 

00  (  'l2 

/  G{x)f{x,y)  dx  =  jH3{y)  J  dx 

-00  0  -00 1  y ) 

where  G(x)=— ~  and  H{y)= -rf-r  so  that  G  and  H  may  each  be  inter- 
v  '  g{x)  h{y) 

preted  as  a  "  bin  density  function  "  ,  i.e.  ,  a  function  which  determines  at  a 
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Multiplying  equation  (3.2.2)  by  r(x)  and  integrating  yields 
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Similarly,  from  (3.2.3)  : 
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If  we  let 


45 


If  these  sequences  converge,  their  limits  will  satisfy  equations  (2.6.1).  If 
f{x,y )  =  r(x)  s(y),  we  have 
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so  that 
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Thus  on  any  interval  in  which  r(*)  and  s(y)  are  bounded  away  from  zero, 
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We  show  that  the  sequence  vn  converges.  By  (3.2.4)  and  (3.2.5), 
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j_  1 

v  =  lim  un+i  =  lim  Kxvn  9  =  Kxv9 

n— »oo  n— »oo 

i.  — 

so  that  V9  =KX  and  t ,  =  Kx8  .  Similarly,  the  sequence  (3.2.7)  converges 
9_ 

to  K%  • 

In  cases  where  f(x,y)  is  not  separable,  the  proof  of  convergence  depends 
upon  the  solution  of  difficult  cubic  equations  and  is  not  yet  available. 

3.3  Determination  of  cell  boundaries 

Whether  determined  analytically  or  numerically,  once  the  optimal  cell 
dimensions  v  and  w  have  been  obtained,  a  mesh  must  be  constructed  in  the 
plane.  The  direct  use  of  optimal  cell  dimensions  for  the  construction  of  this 
mesh  may  lead  to  serious  complications:  after  placing  the  first  cell  boundary 
in  one  of  the  dimensions,  the  center  of  the  next  adjacent  cell  must  be  deter¬ 
mined  by  solving  a  difficult  equation  which  may  not  even  have  a  unique 
solution.  An  alternative  method  suggested  in  1983  in  the  Terrell-Scott 
investigation  of  optimal  interval  width  for  one-dimensional  histograms  is  to 
integrate  the  bin  density  function  G{x )  or  H{y )  with  respect  to  i  or  y 
respectively,  to  place  a  boundary  at  the  point  along  the  particular  axis  at 
which  the  corresponding  bin  density  function  attains  the  value  one,  and  to 
repeat  the  process  beginning  each  integration  at  the  most  recently  deter¬ 
mined  boundary.  In  view  of  our  asymptotic  arguments  it  is  apparent  that 
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Wn+1  =  K2  '  Wn 


(3.2.7) 


If  V  >  Kx  ,  then 
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Thus  the  sequence  defined  by  (3.2.6)  is  monotonic  and  bounded  and,  there¬ 


fore,  convergent.  If  the  limit  is  denoted  by  v,  we  have 
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where  the  parameters  Mi,  M2,  ^3,  ^4,  °2,  °3,  °4,  P\,  Pi,  an<*  p  were  varie<^  *m 

order  to  test  our  methods  on  wide  variety  of  distributions. 

Based  upon  a  discussion  by  H.A.  David  (1981),  a  special  routine  was 
designed  for  the  generation  of  random  vectors  from  the  Dirichlet  distribution: 


/(*>  y) 


T(a  +  0  +  i)  o-i 

no)  m  nil 


1  (l-x-yr1 


(3.4.2) 


where  0<  x  <1  ,  0<  y  <1  ,  a,  /?,  7  >  0  ,  and  x  +  y  -  1  • 


If  n  uniform  (0,1)  random  variables  are  ordered  so  that 
0  <  ii(i)  <  tt(2)  <  ....  <ti(n)  <1  i  the  joint  density  of  «(f)  and  u{j)  ,  i  <  3  . 


is  given  by 

_ _ n! _ u  .  i 
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If  i  =  u(0  ,  y  =  u(y)  -  t*(0  ,  the  joint  density  of  x  and  y  is  given  by 

x'"1  yJ“,_1  (l-x-y)n"y 


ni 


(i—l)!  (j— •— 1)1  (n-i)l 


where  i  +  y  -  1  and  *,  j,  >  0  which  is  a  Dirichlet  distribution  with 
parameters  a  -  i  ,  /?  =  j  -  i  ,  and  7  -  *»  -  3  +  1  •  Therefore,  Diri- 
chlet(a  ,  j3 , 7)  random  vectors  may  be  obtained  by  generating  a  random 
sample  0fn=a  +  /?  +  7-  l  uniform  (0,1)  random  numbers,  ordering  them 

as  described  above,  and  setting  x  —  ,  y  w(a+/?)  U(Q) 
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the  initial  point  for  this  procedure  is  not  important  as  sample  size  increases. 
3.4  Heights  above  cells 

Having  constructed  an  optimal  mesh,  optimal  histograms  may  be  pro¬ 
duced  by  generating  n  random  vectors  ( x,y )  from  the  same  bivariate  distri¬ 
bution  which  was  used  to  construct  the  mesh  by  counting  the  number  of 
points  v  which  fall  in  each  cell,  and  by  forming  the  estimate  /,  the  height  of 
the  histogram  block  above  the  cell,  where  /  is  set  equal  to  v  divided  by  the 
product  of  n  with  the  area  of  the  cell. 

A  library  routine  was  used  together  with  other  code  for  the  generation  of 
random  vectors  from  a  bivariate  normal  density: 
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These  procedures  were  implemented,  and  the  resulting  IMSEs ,  efficiencies 
and  graphs  are  presented  in  Section  4. 
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3.5  Simulations 

Simulations  were  performed  in  which  an  optimal  mesh  was  obtained  for 
a  given  probability  density  function  /  and  a  given  sample  of  size  n,  followed 
by  the  generation  of  a  large  number  of  samples  from  / .  The  empirical 

imse\ 

_  .  Vj+hj 

iMSEa  =  E  E  J  I  (  -/(**»)  )2  » 

1  «-i  V]  2< 


where  n  is  the  number  of  interval  boundaries  in  the  x-direction  and  m  is  the 
number  of  interval  boundaries  in  the  y-direction,  was  calculated  for  each 
sample  and  the  average  empirical  IMSE  obtained.  A  comparison  of  these 
with  the  theoretical  IMSEs  is  presented  in  the  next  chapter. 
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4.  Results 

4.1  Hardware  and  Software 

The  algorithms  described  in  the  previous  chapter  were  coded  in  FOR¬ 
TRAN  77  and  implemented  on  a  VAX  780  and  subsequently  on  a  VAX  750 
and  a  UNIX  pyramid.  IMSL  subroutines  were  used  for  sorting  and  random 
vector  generation.  Histogram  graphics  were  produced  on  a  Macintosh  Apple 
personal  computer  and  an  Imagen  laser  printer  using  Pascal  programs  writ¬ 
ten  by  Hausi  Muller.  Probability  density  graphics  were  produced  by  the  Pre¬ 
cision  Visuals  DI-3000  software  package  and  Surface  Display  Library  on  a 
VAX  785  and  a  Versatec  plotter. 

4.2  Integrated  mean  squared  errors 

Circular,  elliptical  and  bimodal  bivariate  normal  distributions  and  Diri- 
chlet  distributions  were  used  to  test  the  concepts  presented  in  the  previous 
chapters.  Arrays  containing  the  bin  dimensions  g  and  h  were  calculated  for 
each  case,  the  resulting  bin  density  function  arrays  were  used  to  set  bin 
boundaries  in  the  plane,  random  vectors  sampled  from  the  appropriate  distri¬ 
butions  were  distributed  among  the  bins,  and  optimal  histograms  were  con¬ 
structed.  The  theoretical  integrated  mean  squared  error  was  calculated  for 
each  grid  type  for  each  distribution  and  the  IMSEs  compared  on  the  basis  of 
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their  efficiencies  relative  to  the  free  grid  for  each  particular  distribution. 
Simulations  to  test  the  effects  upon  real  data  were  also  run. 

Figures  4.2.1,  4.2.2,  4.2.3,  and  4.2.4  show  the  effect  of  increasing  sample 
size  upon  the  form  of  the  bivariate  histogram.  The  same  bivariate  density 
was  used  in  each  case.  The  effects  of  increasing  sample  size  upon  the 
integrated  mean  squared  error  may  be  found  in  Table  4.2.1. 


Figure  4.2.1.  Grid  histogram  based  upon  a  sample  size  of  50  from  the  bivari¬ 
ate  normal  density  3.4.1  with  p  =  .5,  P\  —  Pq  =  0»  =  <72  =  03  =  cr4  =  1,  M 1 

=  -1,  =  0,  /Z3  =  1,  /^4  =  0. 


Figure  4.2.2  Grid  histogram  based  upon  a  sample  size  of  500  (parameters 
same  as  in  Figure  4.2.1). 


Figure  4.2.3.  Grid  histogram  based  upon  a  sample  size  of  1000  (parameters 
same  as  in  Figure  4.2.1). 


Figure  4.2.4.  Grid  histogram  based  upon  a  sample  size  of  5000  (parameters 
same  as  in  Figure  4.2.1). 


Normal 

,71=a2=<T3=cr4=l  p=.5  ^2=0  /z3=l  /V=0 

Sample 

Size 

Theoretical 
Grid  IMSE 

Theoretical 
Free  IMSE 

Efficiency 
of  Grid  Mesh 

50 

11.75 

10.12 

.8613 

100 

8.310 

7.156 

.8611 

225 

5.354 

4.771 

.8911 

500 

3.592 

3.200 

.8909 

1000 

2.540 

2.263 

.8909 

1500 

2.074 

1.848 

.8910 

2000 

1.796 

1.600 

.8908 

5000 

1.136 

1.012 

.8908 

Table  4.2.1. 
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In  every  case  the  free  grid  performed  best  but,  as  noted  previously,  is 
not  implementable  and  is  of  theoretical  interest  only  because  it  provides  a 
lower  bound  on  the  integrated  mean  squared  error  achievable  by  these 
7  methods.  Also  in  every  case,  the  errors  for  the  regular  mesh  were  largest, 
those  for  the  semiregular  mesh  smaller,  those  for  the  grid  still  smaller,  and 
those  for  the  semigrid  smallest  for  the  implementable  grid  types.  The  fol¬ 
lowing  figures  illustrate  the  four  types  of  histogram  for  the  case  of  the  bivari¬ 
ate  circular  normal  density. 


Figure  4.2.9.  Semigrid  histogram  (density  as  in  Figure  4.2.5,  sample  size 

2000). 
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Four  sets  of  bimodal  normal  distributions  were  studied,  each  set  having 
a  different  distance  between  modes.  Within  each  set,  the  modes  were  first 
aligned  along  the  x-axis,  then  along  the  y-axis,  and  finally  along  the  line 
x  —  y  .  This  was  done  in  order  to  examine  the  difference  in  performance 
between  the  various  grid  types  when  presented  with  the  same  data  under 
several  rotations.  Figures  4.2.10  through  4.2.24  illustrate  the  case  of  a  mixed 
bivariate  unimodal  normal  density,  and  figures  4.2.25  through  4.2.39  illus- 


Figure  4.2.10.  Bivariate  normal  density  3.4.1  with  p  =  .5,  px  —  p2  “  ~ 

aw  =  cr3  =  a4  =  1,  Hi  =  -1,  Hi  —  ^3  =  ^4  —  °* 


Figure  4.2.12.  Sexniregular  histogram  (density  as  in  Figure  4.2.10,  sample 
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0.096' 


Figure  4.2.15.  Bivariate  normal  density  3.4.1  with  p  —  .5,  px  —  Pi  0, 
cr2  =  a3  =  cr4  =  1,  =  0,  p2  ~  M3  =  0,  //4  =  1. 
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OiJ36’ 


Figure  4.2.20.  Bivariate  normal  density  3.4.1  with  p  —  .5,  px  Pi  °>  °i 
cr2  —  cr3  =  cr4  =  1,  Vi  =  ^2  =  =  ^4  ”  *7071 
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Figure  4.2.25.  Bivariate  normal  density  3.4.1  with  p  =  .5,  px  —  p2  —  0,  <rx  — 
cr2  —  <j3  —  aA  =  1,  Hi  =  -1.5,  H2  =  0>  ^3  =  ^4  = 
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Figure  4.2.26.  Regular  histogram  (density  as  in  Figure  4.2.25,  sample  size  = 
2000). 


Figure  4.2.27.  Semiregular  histogram  (density  as  in  Figure  4.2.25,  sample  size 
=  2000). 


/Kmpl'tuP* 
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Figure  4.2.30.  Bivariate  normal  density  3.4.1  with  p  =  .5,  px  —  P2  —  0,  crx 
<72  —  °3  ~  °4  ~  =  0?  ^2  =  P'3  —  0>  Pa  ~ 


Figure  4.2.34.  Semigrid  histogram  (density  as  in  Figure  4.2.30,  sample  size  = 

2000). 
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Figure  4.2.35.  Bivariate  normal  density  3.4.1  with  p  =  .5,  px  =  p2  =  0,  cx  — 
a2  -  <r3  =  cr4  -  1,  =  P'2  =  -1-060,  P3  =  P4  =  1-060. 


Figure  4.2.39.  Semigrid  histogram  (density  as  in  Figure  4.2.35,  sample  size  = 
2000). 
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There  was,  of  course,  no  difference  in  performance  by  the  regular,  grid,  and 
free  meshes  when  alignment  was  made  along  either  the  x-  or  y-axes.  There 
were  differences,  however,  in  the  semiregular  and  semigrid  cases.  When 
bimodality  is  expressed  in  a  particular  direction,  the  adaptability  of  the 
model  in  that  direction  determines  the  relative  size  of  the  integrated  mean 
squared  error.  Because  of  the  structure  of  the  semiregular  grid  used  (in 
which  the  interval  widths  in  the  x-direction  were  variable  and  the  interval 
widths  in  the  y-direction  were  constant),  the  integrated  mean  squared  errors 
were  less  when  the  modes  were  aligned  along  the  x-axis  than  when  they  were 
aligned  along  the  y-axis.  Because  of  the  structure  of  the  semigrid  mesh  used 
(in  which  the  interval  widths  in  the  x-direction  were  variable  and  the  interval 
widths  in  the  y-direction  were  variable  within  each  strip),  the  integrated 
mean  squared  errors  were  greater  when  the  modes  were  aligned  along  the  x- 
axis  than  when  they  were  aligned  along  the  y-axis.  This  is  due  to  the  fact 
that  this  mesh  provides  greater  adaptability  in  the  y-direction  by  allowing 
bin  boundaries  to  be  set  independently  within  each  strip.  In  all  cases,  the 
integrated  mean  squared  errors  were  greater  when  the  modes  were  aligned 
along  the  line  x  =  y  .  This  occurs  because  of  the  basic  structure  of  all  of  the 
meshes:  cell  sides  are  parallel  to  the  x-  and  y-axes  in  all  cases.  The  IMSE 
results  and  efficiencies  relative  to  the  free  mesh  for  specific  cases  are  found  in 


Tables  4.2.3  and  4.2.4. 
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Bivariate  Normal  Distribution 

<j1—a2—o'3=crA=l  sample  size  =  2000 

Parameters  of  Normal  Distribution 

No  1 

mi 

mi 

mi 

SHU 

BPHI 

_ 

mi 

l 

0 

0 

0 

0 

0 

0 

mi 

.5 

0 

0 

-.5 

0 

.5 

0 

m 

.5 

0 

0 

0 

-.5 

0 

.5 

mi 

.5 

0 

0 

-.3536 

-.3536 

.3536 

.3536 

mi 

.5 

0 

0 

-1 

0 

1 

0 

m 

.5 

0 

0 

0 

-1 

0 

1 

.5 

0 

0 

-.7071 

-.7071 

.7071 

.7071 

8 

.5 

0 

0 

-1.5 

0 

1.5 

0 

9 

.5 

0 

0 

0 

-1.5 

0 

mi 

10 

.5 

0 

0 

-1.061 

-1.061 

1.061 

1.061 

11 

.5 

0 

0 

-2 

0 

2 

0 

12 

.5 

0 

0 

0 

-2 

0 

2 

13 

.5 

0 

0 

-1.414 

-1.414 

1.414 

1.414 

14 

mi 

.2 

0 

0 

0 

0 

15 

.5 

.5 

0 

0 

0 

0 

16 

wu 

wa 

m 

0 

0 

0 

0 

17 

.8 

.8 

0 

0 

0 

0 

18 

.5 

0 

0 

-.25 

0 

.25 

0 

19 

.5 

0 

0 

0 

-.25 

0 

.25 

20 

1  .5 

0 

0 

-.1768 

-.1768 

.1768 

.1768 

21 

.5 

0 

0 

-1.2 

0 

1.2 

0 

22 

.5 

0 

0 

0 

-1.2 

0 

1.2 

23 

.5 

0 

0 

-.8485 

-.8485 

.8485 

.8485 

cr,=.5  cr9=. 5  <r?=2  cr4=2  sample  size  =  20C 

10 

24 

.5 

0 

0 

-1.5 

0 

1.5 

1  o 

cr,=. 2  ct9=.2  ct9=3  <t4=3  sample  size  =  20C 

>0 

I  25 

.5 

0 

0 

-1.5 

0 

1.5 

0 

Table  4.2.2.  Identifying  numbers  and  density  parameters 
for  Tables  4.2.3  and  4.2.4. 


Bivariate  Normal  Distribution 


a-1=a2=cr3=a4=l  sample  size  =  2000 


Theoretical  IMSE  10 


-3 


Regular 


Semireg. 


Grid 


Semigrid 


Free 


3.643 


3.336 


3.054 


2.840 


2.643 


3.229 


2.947 


2.699 


2.509 


2.334 


3.229 


2.958 


2.699 


2.509 


2.334 


3.242 


3.014 


2.802 


2.612 


2.401 


2.484 


1.961 


1.796 


1.785 


1.600 


2.484 


2.274 


1.796 


1.776 


1.600 


2.576 


2.491 


2.410 


2.226 


2.027 


2.355 


2.090 


1.915 


1.802 


1.681 


2.355 


2.156 


1.915 


1.772 


1.681 


2.399 


2.290 


2.182 


2.007 


1.816 


2.498 


2.281 


2.089 


1.939 


1.830 


2.498 


2.287 


2.089 


1.904 


1.830 


2.504 


2.314 


2.139 


1.966 


1.812 


3.757 


3.531 


3.321 


3.068 


2.983 


4.519 


4.421 


4.329 


3.909 


3.645 


6.035 


5.999 


5.966 


5.369 


5.076 


7.837 


7.811 


7.790 


7.023 


6.467 


3.531 


3.233 


2.963 


2.755 


2.413 


3.531 


3.233 


2.963 


2.755 


2.422 


3.533 


3.247 


2.985 


2.788 


2.428 


2.343 


1.880 


1.722 


1.552 


1.036 


2.343 


2.147 


1.722 


1.598 


1.171 


2.437 


2.366 


2.296 


2.116 


1.732 


5.539 


4.928 


4.416 


3.763 


Bivariate  Normal  Distribution 
cr1=ij2=a3=a4=l  sample  size  =  2000 


Theoretical  Efficiencies  relative  to  Free  Grid 


No. 

Regular 

Semireg. 

Grid 

Semigrid 

Free 

1 

.7255 

.7923 

.8654 

.  .9306 

1.000 

2 

.7228 

.7920 

.8647 

.9303 

1.000 

3 

.7228 

.7890 

.8647 

.9303 

1.000 

4 

.7406 

.7966 

.8569 

.9192 

1.000 

5 

.6441 

.8159 

.8909 

.8963 

1.000 

6 

.6441 

.7036 

.8909 

.8502 

1.000 

7 

.7869 

.8137 

.8411 

.9106 

1.000 

8 

.7138 

.8043 

.8778 

.9329 

1.000 

9 

.7138 

.7797 

.8778 

.9486 

1.000 

10 

.7570 

.7930 

.8323 

.9048 

1.000 

11 

.7326 

.8023 

.8760 

.9438 

1.000 

12 

.7326 

.8002 

.8760 

.9611 

1.000 

13 

.7236 

.7831 

.8471 

.9217 

1.000 

14 

.7940 

.8448 

.8982 

.9723 

1.000 

15 

.8066 

.8245 

.8420 

.9325 

1.000 

16 

.8411 

.8461 

.8508 

.9454 

1.000 

17 

.8252 

.8279 

.8302 

.9208 

1.000 

18 

.6834 

.7464 

.8144 

.8759 

1.000 

19 

.6859 

.7491 

.8174 

.8791 

1.000 

20 

.6872 

.7478 

.8134 

.8709 

1.000 

21 

.4422 

.5510 

.6016 

.6675 

1.000 

22 

.4998 

.5454 

.6800 

.7328 

1.000 

23 

.7107 

.7320 

.7544 

.8185 

1.000 

24 

.5226 

.6794 

.7636 

.8521 

1.000 

25 

.4624 

.6607 

.7828 

.8829 

1.000 

Table  4.2.4 
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The  integrated  mean  squared  error  of  the  elliptical  bivariate  normal 
distribution  is  quite  similar  to  that  of  the  circular  normal  when 
Pl  =  p2  =  0.2  but  as  ellipticity  increases,  the  error  increases  rapidly  and  is 
tripled  at  p1  =  p2  —  0.8  .  The  differences  in  adaptability  between  the 
different  mesh  types  nearly  disappears  at  greater  ellipticities,  because  all  of 
the  meshes  consist  of  line  segments  which  are  parallel  to  the  axes.  The  ability 
to  adapt  to  highly  elliptical  distributions  is  limited  primarily  to  decreasing 
cell  size,  so  that  not  only  is  the  error  similar  for  each  grid  type,  but  also  the 
forms  of  the  optimal  histogram  are  quite  similar.  Figures  4.2.41  and  4.2.42 
demonstrate  the  similarity  of  form. 


Amplitude 


0.185! 


Figure  4.2.40.  Bivariate  normal  density  3.4.1  with  p  =  1,  P\  —  -5,  crx  —  <x2 
=  <?4  —  lj  Mi  =  f^2  =  ^3  “  A*4  " 
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Further  improvement  can  be  made  only  by  the  rotation  of  the  entire  mesh  so 
that  the  cell  sides  become  parallel  to  the  major  axis  of  an  elliptical  distribu¬ 
tion  or  to  the  line  joining  the  modes  of  a  bimodal  distribution.  Terrell 
(1984)  provided  a  solution  to  this  problem  of  orientation  in  the  case  of  a  reg¬ 
ular  mesh.  He  defined  an  "information  matrix" 


h 


2L 

dx 


and  showed  that  the  integrated  mean  squared  error  of  a  histogram  based 
upon  a  regular  mesh  is  minimal  when  the  product  of  the  diagonal  elements  of 
If  is  minimal.  This  occurs  when  the  product  of  the  diagonal  elements  is  equal 
to  the  determinant  of  If  ,  that  is  when  If  is  diagonal.  If  is  diagonalized  by 
pre-  and  post-  multiplying  by  an  orthonormal  matrix  whose  columns  are  the 
eigenvectors  of  If  .  Then  the  axes  of  the  histogram  are  oriented  in  the  direc¬ 
tion  of  the  eigenvectors  of  If  and  the  constant  bin  dimensions 
g  =  Cx  ,  h  =  C2  are  chosen  according  to  the  expressions  given  in  section 
2.2.  The  problem  of  orientation  of  general  meshes  is  beyond  the  scope  of 
this  paper. 

Simulations  were  performed  in  which  the  average  empirical  integrated 
mean  squared  error  was  obtained.  These  averages  were  slightly  better  than 
the  theoretical  IMSEs  but  always  higher  than  the  lower  bound  provided  by 
the  free  mesh.  The  averaging  of  100  simulations  using  the  circular  normal 
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distribution  produced  an  empirical  IMSE  of  3.179  for  the  regular  mesh,  2.992 
for  the  semiregular  mesh,  and  2.807  for  the  grid  mesh.  The  averaging  of  100 
simulations  using  density  3.4.1  with  p  =  .5,  P\  —  Pi  —  0,  crx  —  <r2  =  <t3  =*  <r4  = 
1,  nx  =  -1.5,  =  0,  —  1.5,  p,4  =  0,  produced  an  empirical  IMSE  of  1.928 

for  the  regular  mesh,  1.792  for  the  semiregular  mesh,  and  1.748  for  the  grid 
mesh.  Terrell  and  Scott  (1983)  obtained  empirical  results  for  the  one  dimen¬ 
sional  case  which  were  also  slightly  better  than  those  predicted  by  theory. 

A  set  of  Dirichlet  distributions  with  a  variety  of  parameters  was  studied 
in  order  to  observe  the  performance  of  the  different  meshes  with  skewed  dis¬ 
tributions.  One  of  these  is  illustrated  in  the  next  figure;  the  corresponding 
histograms  follow.  Tables  4.2.5  and  4.2.6  contain  the  theoretical  IMSEs  and 
relative  efficiencies.  Results  were  similar  in  general  to  the  results  from  the 
normal  distributions,  the  semigrid  mesh  showing  a  20%  improvement  over  the 


regular  mesh. 


JKm  f>ttt  ud  • 
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Figure  4.2.46.  Grid  histogram  (density  as  in  Figure  4.2.43,  sample  size  = 
2000). 


Figure  4.2.47.  Semigrid  histogram  (density  as  in  Figure  4.2.43,  sample  size  — 
2000). 
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Dirchlet 

sample  size  =  2000 

Parameters 

Theoretical  IMSE 

a 

mm 

mi 

H 

Grid 

Free 

3 

mi 

mi 

.2018 

.1938 

.1865 

.1612 

.1358 

5 

5 

5 

.3164 

.3072 

.2987 

.2650 

.2279 

3 

3 

10 

.4926 

.4325 

.3792 

.3497 

.3157 

B 

m 

m 

.4363 

.4249 

.4146 

.3698 

.3198 

8 

6 

8 

.4584 

.4448 

.4293 

.3855 

.3350 

10 

m 

.4948 

.4823 

.4653 

.4157 

.3607 

10 

5 

.4888 

.4801 

.4706 

.4139 

.3562 

4 

10 

m 

.4989 

.4866 

.4776 

.4172 

.3549 

10 

m 

m 

.5212 

.5114 

.5011 

.4456 

.3855 

10 

10 

.5548 

.5344 

.5058 

.4577 

.4003 

10 

mm 

5 

.5331 

.5268 

.5217 

.4582 

.3951 

3 

10 

3 

.5623 

.5436 

.5282 

.4524 

.3774 

10 

3 

3 

.5626 

.5373 

.5284 

.4483 

.3774 

10 

10 

10 

.6178 

.6028 

.5890 

.5277 

.4575 

10 

4 

4 

.4989 

.4881 

.4776 

.4155 

.3549 

10 

3 

10 

.6270 

.5970 

.5311 

.4843 

.4275 

4 

4 

10 

.4474 

.4099 

.3754 

.3439 

.3068 

10 

IB 

10 

.5720 

.5483 

.5087 

.4615 

.4047 

5 

IB 

10 

.4481 

.4197 

.3933 

.3584 

.3175 

5 

10 

5 

.4888 

.4783 

.4707 

.4152 

.3562 

Table  4.2.5. 
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Dirchlet 

sample  size  =  2000 

Parameters 

Theoretical  Efficiencies  relative  to  Free 

Grid 

a 

wm\ 

ni 

89MM 

Grid 

EBB55SB 

Free 

3 

mm 

HI 

.6729 

.7007 

.7282 

.8424 

1.000 

5 

5 

5 

.7203 

.7419 

.7630 

.8600 

1.000 

n 

n 

H 

.7330 

.7526 

.7713 

.8648 

1.000 

10 

10 

10 

.7405 

.7590 

.7767 

.8670 

1.000 

3 

3 

10 

.6409 

.7299 

.8325 

.9028 

1.000 

10 

3 

3 

.6708 

.7024 

.7142 

.8418 

1.000 

3 

10 

3 

.6712 

.6943 

.7145 

.8342 

1.000 

10 

3 

10 

.6818 

.7161 

.8049 

.8827 

1.000 

4 

4 

10 

.6857 

.7485 

.8173 

.8921 

1.000 

10 

4 

H 

.7114 

.7271 

.7431 

.8542 

1.000 

4 

10 

.7114 

.7293 

.7431 

.8507 

1.000 

10 

■a 

10 

.7075 

.7381 

.7956 

.8769 

1.000 

5 

10 

.7085 

.7565 

.8073 

.8859 

1.000 

10 

5 

5 

.7287 

.7419 

.7569 

.8606 

1.000 

5 

10 

5 

.7287 

.7447 

.7567 

.8579 

1.000 

10 

5 

10 

.7215 

.7491 

.7914 

.8746 

1.000 

10 

5 

B 

.7290 

.7479 

.7752 

.8677 

1.000 

10 

IB 

B 

.7411 

.7500 

.7573 

.8623 

1.000 

10 

B 

B 

.7396 

.7538 

.7693 

.8651 

1.000 

8 

6 

8 

.7308 

.7531 

.7803 

.8690 

1.000 

Table  4.2.6. 
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The  greatest  improvements  in  IMSE  made  possible  by  adaptive  meshes 
were  observed  for  mixed  normal  distributions  in  which  the  variances  differed 
widely.  It  was  already  noted  that  a  20%  improvement  can  be  expected  for 
mixed  normal  distributions  having  the  same  variance  as  well  as  for  unimodal 
normal  and  Dirichlet  distributions.  An  improvement  in  efficiency  of  over  90% 
may  be  obtained  when  the  variances  of  bimodal  normals  differ,  for  example 
when  (Xx  =  a2  =  .2,  cr3  =  <r4  =  3  (see  Tables  4.2.3  and  4.2.4). 
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5.  Application 

Scott  (1986)  provided  data  on  paired  cholesterol  and  triglyceride  levels 
in  320  patients  with  heart  disease.  It  is  assumed  that  these  levels  have  a 
bivariate  elliptical  normal  distribution  with  mean  vector  and  covariance 
matrix  estimated  from  the  sample.  Histograms  were  constructed  using  the 
four  mesh  types  discussed  in  previous  sections.  All  of  the  histograms  suggest 
a  bimodal  structure  in  which  the  modes  are  separated  by  less  than  2  o'  so 
that  the  bimodality  is  not  represented  by  separate  peaks  but  rather  by  an 
elongation  of  the  cells  in  the  y-direction. 


Figure  5.1.  Regular  histogram  based  upon  bivariate  elliptical  normal  data 
from  320  cardiology  patients;  normal  model  as  in  equation  3.4.1  with  p  =  1, 
pl  =  .226,  =  2.162,  oj  =  .4301,  /i2  =  1.794,  <r2  =  1.019. 


06 


Figure  5.2.  Grid  histogram  based  upon  bivariate  elliptical  normal  data  from 
320  cardiology  patients;  normal  model  as  in  equation  3.4.1  with  p  =  1,  px  — 
.226,  fix  =  2.162,  <rx  =  .4301,  /x2  =  1.794,  <r2  =  1.019. 


A  similar  structure  is  apparent  in  the  theoretical  cases  in  which  the  modes 
are  separated  by  2  cr.  It  is  impossible  using  our  methods  to  distinguish 
between  the  form  of  distributions  when  the  modes  are  between  2.2  and  2.7  <7 
apart  with  such  small  sample  sizes.  For  example,  we  were  not  able  to 
demonstrate  separate  peaks  with  our  histograms  at  a  sample  size  of  10,000 
when  the  modes  were  separated  by  2.4  c  even  though  the  true  functional 
form  had  two  clearly  defined  peaks.  See  Figures  5.3  and  5.4. 
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Figure  5.4.  Semiregular  histogram  (density  as  in  Figure  5.1  with  sample  size 
of  10,000).  The  Grid  and  Semigrid  histograms  constructed  from  the  same 
density  and  sample  size  were  similar  in  appearance  and  did  not  show  bimo¬ 
dality. 
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6.  Discussion 

Some  general  observations  may  be  made  regarding  the  implications  of 
our  research.  Specifically,  we  should  indicate  under  what  conditions 
variable-dimension  meshes  are  preferable  to  fixed-dimension  meshes.  When¬ 
ever  the  density  consists  of  distinct  sections  whose  forms  are  substantially 
different,  such  as  is  the  case  for  a  mixed  bivariate  normal  density  having 
unequal  variances,  we  strongly  recommend  the  use  of  variable-dimension 
meshes.  In  the  case  of  densities  whose  forms  approach  that  of  the  circular 
normal  and  where  the  choice  of  mesh  will  not  result  in  substantial  decreases 
in  error,  the  investment  of  time  and  effort  relative  to  improvement  of  esti¬ 
mate  should  be  considered.  Although  the  Grid  mesh  will  not  produce  as 
efficient  an  estimator  as  the  Semigrid  it,  nevertheless,  may  be  preferable  to 
the  Semigrid  because  of  its  ease  of  implementation  and  symmetry. 

In  the  case  of  a  highly  elliptical  density  whose  major  axis  is  not  approxi¬ 
mately  parallel  to  one  of  the  coordinate  axes,  no  improvement  will  result 
from  the  use  of  variable  dimension  meshes.  In  order  to  obtain  better  esti¬ 
mates  in  this  case,  a  change  of  coordinates  should  be  made  so  that  the  major 
axis  of  the  density  is  parallel  to  one  of  the  coordinate  axes,  and  only  then 
should  one  of  the  more  efficient  meshes  be  constructed.  Further  research  in 


mesh  orientation  is  recommended. 
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Since  the  histogram  estimator,  even  one  based  upon  variable-dimension 
mesh,  may  not  adequately  represent  bimodal  densities  in  which  the  distance 
between  the  modes  is  small,  another  type  of  estimator  such  as  a  variety  of 
the  Rosenblatt  kernel  may  be  preferred.  Again,  the  importance  of  accuracy 
and  the  level  of  sophistication  must  be  balanced  in  each  application. 


101 


REFERENCES 


Apostol,  T.  M.  (1964).  Mathematical  Analysis.  Addison-Wesley,  London. 

Bean,  S.  J.,  Tsokos,  C.  P.  (1980).  Developements  in  Nonparametric  Density 
Estimation.  Internat.  Statist.  Ref.,  48,  267. 

Breiman,  L.,  Meisel,  W.,  Purcell,  E.  (1977).  Variable  Kernel  Estimates  of 
Multivariate  Densities.  Technometrics,  19,  135. 

fcencov,  N.  N.  (1962).  Evaluation  of  an  Unknown  Density  Distribution  from 
Observations.  Soviet  Math.,  3,  1559. 

David,  H.  A.  (1981).  Order  Statistics.  Wiley,  New  York. 

Epanechnikov,  V.  A.  (1969).  Nonparametric  Estimator  of  a  Multivariate  Pro¬ 
bability  Density.  Theor.  Prob.  Appl.,  14,  153. 

Freedman,  D.,  Diaconis,  P.  (1981).  On  the  Histogram  as  a  Density  Estimator: 
L2  Theory.  Z.  Wahrsch.  Verw.  Gebiete,  57,  453. 

Freedman,  D.,  Diaconis,  P.  (1981).  On  the  Maximum  Deviation  Between  the 
Histogram  and  the  Underlying  Density.  Z.  Wahrsch.  Verw.  Gebiete,  58,  139. 

Hand,  D.  J.  (1981).  Discrimination  and  Classification.  Wiley,  New  York. 

IMSL  (1984).  International  Mathematical  and  Statistical  Libraries.  Houston, 
Texas. 

Kendall,  M.  G.,  Stuart,  A.  (1969).  The  Advanced  Theory  of  Statistics,  Vol.  I, 
II,  III  Griffin,  London. 

Nezames,  D.  D.,  Scott,  D.  W.  (1980).  Some  Results  for  Estimating  Bivariate 
Densities  using  Kernel,  Orthogonal  Series,  and  Penalized  Likelihood  Methods. 
Doctoral  Dissertation,  Rice  University,  Houston,  Texas. 

Olmstead,  J.  M.  H.  (1956).  Real  Variables.  Appleton-Century-Crofts,  New 
York. 


102 


Parzen,  E.  (1962).  On  Estimation  of  a  Probability  Density  Function  and 
Mode.  Ann.  Math.  Statist.,  33,  1065. 

Rosenblatt,  M.  (1965).  Remarks  on  Some  Nonparametric  Estimates  of  a  Den¬ 
sity  Function.  Ann.  Math.  Statist.,  27,  832. 

Scott,  D.  W.  (1976).  Nonparametric  Probability  Density  Estimation  by 
Optimization  Theoretic  Techniques.  Doctoral  Dissertation,  Rice  University, 
Houston,  Texas. 

Scott,  D.  W.  (1979).  On  Optimal  and  Data-Based  Histograms.  Biometrika, 
66,  605. 

Scott,  D.  W.  (1982).  Optimal  Meshes  for  Histograms  Using  Variable- Width 
Bins.  Paper  presented  to  the  annual  meeting  of  the  American  Statistical 
Association. 

Scott,  D.  W.  (1985).  A  Note  on  Choice  of  Bivariate  Histogram  Bin  Shape. 
Unpublished  manuscript. 

Scott,  D.  W.  (1986).  Histogram  Proof  a  la  Frequency  Polygon.  Unpublished 
manuscript. 

Tapia,  R.  A.,  Thompson,  J.  R.  (1978).  Nonparametric  Probability  Density 
Estimation.  John  Hopkins  University  Press,  Baltimore. 

Tarter,  M.  E.,  Kronmal,  R.  A.  (1970).  On  Multivariate  Density  Estimates 
Based  on  Orthogonal  Expansions.  Ann.  Math.  Statist.,  41,  718. 

Tarter,  M.  E.,  Kronmal,  R.  A.  (1976).  An  Introduction  to  the  Implementation 
and  Theory  of  Nonparametric  Density  Estimation.  Amer.  Statist.,  30,  105. 

Terrell,  G.  R.,  Scott,  D.  W.  (1983).  Variable  Windows  Density  Estimates. 
Unpublished  manuscript. 

Terrell,  G.  R.  (1983).  Bivariate  Histograms.  Unpublished  manuscript. 

Terrell,  G.  R.  (1984).  Orienting  Regular  Rectangular  Histograms.  Unpub¬ 
lished  manuscript. 

Terrell,  G.  R.  (1986).  Multivariate  Histograms.  Unpublished  manuscript. 


